Automatic Error Analysis for Morphologically Rich Languages

Authors

  • Ahmed El Kholy
  • Nizar Habash
Abstract

This paper presents AMEANA, an open-source tool for error analysis of natural language processing tasks targeting morphologically rich languages. Unlike standard evaluation metrics such as BLEU or WER, AMEANA automatically provides a detailed error analysis that can help researchers and developers better understand the strengths and weaknesses of their systems. AMEANA is easily adaptable to any language for which a morphological analyzer exists. In this paper, we focus on usability in the context of Machine Translation (MT) and demonstrate it specifically for English-to-Arabic MT.

Similar resources

Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora

We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core language-independent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction o...


PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits

We present a freely available corpus containing source language texts from different domains along with their automatically generated translations into several distinct morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. We believe that the corpus will be useful for many different applications. The main advantage of the approa...


Special Techniques for Constituent Parsing of Morphologically Rich Languages

We introduce three techniques for improving constituent parsing for morphologically rich languages. We propose a novel approach to automatically find an optimal preterminal set by clustering morphological feature values and we conduct experiments with enhanced lexical models and feature engineering for rerankers. These techniques are specially designed for morphologically rich languages (but th...


Error Analysis and Improving Speech Recognition for Latvian Language

Developing a large-vocabulary automatic speech recognition system is a very difficult task because of high domain and acoustic variability. This task is even more difficult for the Latvian language, which is very rich morphologically and in which one word can have dozens of surface forms. Although there is some research on speech recognition for Latvian, Latvian ASR remains behin...


Arabic Language Modeling with Finite State Transducers

In morphologically rich languages such as Arabic, the abundance of word forms resulting from increased morpheme combinations is significantly greater than for languages with fewer inflected forms (Kirchhoff et al., 2006). This exacerbates the out-of-vocabulary (OOV) problem. Test set words are more likely to be unknown, limiting the effectiveness of the model. The goal of this study is to use t...



Publication date: 2011